NSF PAR Search | NSF Public Access Repository

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation

Huang, Ruizhe; Yarmohammadi, Mahsa; Khudanpur, Sanjeev; Povey, Daniel (September 2024, Interspeech)

Existing research suggests that automatic speech recognition (ASR) models can benefit from additional contexts (e.g., contact lists, user-specified vocabulary). Rare words and named entities can be better recognized with contexts. In this work, we propose two simple yet effective techniques to improve context-aware ASR models. First, we inject contexts into the encoders at an early stage instead of merely at their last layers. Second, to encourage the model to leverage the contexts during training, we perturb the reference transcription with alternative spellings so that the model learns to rely on the contexts to make correct predictions. On LibriSpeech, our techniques together reduce the rare word error rate by 60% and 25% relative to no biasing and shallow fusion, respectively, setting a new state of the art. On SPGISpeech and the real-world dataset ConEC, our techniques also yield good improvements over the baselines.
Full Text Available
Enhancing Neural Transducer for Multilingual ASR with Synchronized Language Diarization

https://doi.org/10.21437/Interspeech.2024-1418

Hussein, Amir; Raj, Desh; Wiesner, Matthew; Povey, Daniel; Garcia, Paola; Khudanpur, Sanjeev (September 2024, ISCA)

Full Text Available
On Speaker Attribution with SURT

https://doi.org/10.21437/odyssey.2024-14

Raj, Desh; Wiesner, Matthew; Maciejewski, Matthew; Garcia, Paola; Povey, Daniel; Khudanpur, Sanjeev (June 2024, ISCA)

Full Text Available
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

https://doi.org/10.1109/ICASSP48485.2024.10446102

Hussein, Amir; Yan, Brian; Anastasopoulos, Antonios; Watanabe, Shinji; Khudanpur, Sanjeev (April 2024, IEEE)

Full Text Available
ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition

Huang, Ruizhe; Yarmohammadi, Mahsa; Trmal, Jan; Liu, Jing; Raj, Desh; Paola Garcia, Leibny; Ivanov, Alexei V; Ehlen, Patrick; Yu, Mingzhi; Povey, Dan; et al. (May 2024, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024))
Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.)
Knowing the particular context associated with a conversation can help improve the performance of an automatic speech recognition (ASR) system. For example, if we are provided with a list of in-context words or phrases (such as the speaker’s contacts or recent song playlists) during inference, we can bias the recognition process towards this list. Many works address contextual ASR; however, there are few publicly available real-world benchmarks for evaluation, making it difficult to compare different solutions. To this end, we provide a corpus (“ConEC”) and baselines to evaluate contextual ASR approaches, grounded in real-world applications. The ConEC corpus is based on public-domain earnings calls (ECs) and associated supplementary materials, such as presentation slides, earnings news releases, and a list of meeting participants’ names and affiliations. We demonstrate that such real contexts are noisier than artificially synthesized contexts that contain the ground truth, yet they still leave considerable room for future improvement of contextual ASR technology.
Full Text Available
ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition

Huang, Ruizhe; Yarmohammadi, Mahsa; Trmal, Jan; Liu, Jing; Raj, Desh; Garcia, Leibny P; Ivanov, Alexei; Ehlen, Patrick; Yu, Mingzhi; Povey, Dan; et al (May 2024, ELRA and ICCL)

Full Text Available
Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition

https://doi.org/10.1109/ASRU57964.2023.10389684

Gao, Dongji; Xu, Hainan; Raj, Desh; Perera, Leibny Paola Garcia; Povey, Daniel; Khudanpur, Sanjeev (December 2023, IEEE)
Full Text Available
GPU-accelerated Guided Source Separation for Meeting Transcription

https://doi.org/10.21437/Interspeech.2023-42

Raj, Desh; Povey, Daniel; Khudanpur, Sanjeev (August 2023, Proc. Interspeech 2023)

Full Text Available
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

https://doi.org/10.21437/Interspeech.2023-2258

Gao, Dongji; Wiesner, Matthew; Xu, Hainan; Garcia, Leibny Paola; Povey, Daniel; Khudanpur, Sanjeev (August 2023, Proc. Interspeech 2023)

Full Text Available
SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition

https://doi.org/10.1109/TASLP.2023.3318398

Raj, Desh; Povey, Daniel; Khudanpur, Sanjeev (January 2023, IEEE/ACM Transactions on Audio, Speech, and Language Processing)

Full Text Available
